Here we summarize the processing steps applied to the WASH dataset.
Three WASH variables were created following the WHO definitions (Damazo). See the codebook for variable labels.
cat_watersource
cat_toilettype
cat_garbagedisposal
For every variable, cases coded as NIU or Missing: Impute were recoded to NA.
Cases with NA in Gender and Age were dropped entirely.
Smaller groups were combined into an "Others" category.
A composite variable was created from the three WASH variables using logistic PCA.
Household total expenditure was centered.
For every case, we counted the number of WASH indicators they had access to (max = 3) and calculated the proportion (not sure what to call this rate). Is it possible to model the total as a Poisson process?
Plots were created for the individual WASH variables, but initial modelling uses the composite WASH variable.
We also present results from a generalized linear mixed-effects model (GLMM) fitted with glmer from the lme4 package.
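The processing steps listed above can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the code used for the analysis: the column names `gender`, `age` and `total_expenditure`, the `"Improved"` label marking access, and the example rows are all hypothetical. It also assumes that a case is dropped when either Gender or Age is NA, and that the indicator count is left missing when any WASH variable is NA.

```python
from statistics import mean

# Codes recoded to NA (represented here by None), as named in the steps above.
MISSING_CODES = {"NIU", "Missing: Impute"}
WASH_VARS = ["cat_watersource", "cat_toilettype", "cat_garbagedisposal"]


def recode_missing(row):
    """Return a copy of the row with NIU / Missing: Impute replaced by None."""
    return {key: (None if value in MISSING_CODES else value)
            for key, value in row.items()}


def wash_total(row, improved="Improved"):
    """Count the WASH indicators with access (max = 3); None if any is missing."""
    values = [row[var] for var in WASH_VARS]
    if any(value is None for value in values):
        return None
    return sum(value == improved for value in values)


def preprocess(rows):
    rows = [recode_missing(row) for row in rows]
    # Assumption: drop a case when either gender or age is missing.
    rows = [row for row in rows
            if row["gender"] is not None and row["age"] is not None]
    # Center expenditure on the mean of the observed (non-missing) values.
    centre = mean(row["total_expenditure"] for row in rows
                  if row["total_expenditure"] is not None)
    for row in rows:
        spend = row["total_expenditure"]
        row["expenditure_centered"] = None if spend is None else spend - centre
        total = wash_total(row)
        row["wash_total"] = total
        row["wash_prop"] = None if total is None else total / len(WASH_VARS)
    return rows


# Hypothetical example rows.
rows = [
    {"hh_id": 1, "gender": "Female", "age": 34, "total_expenditure": 100.0,
     "cat_watersource": "Improved", "cat_toilettype": "Unimproved",
     "cat_garbagedisposal": "Improved"},
    {"hh_id": 2, "gender": "NIU", "age": 40, "total_expenditure": 300.0,
     "cat_watersource": "Improved", "cat_toilettype": "Improved",
     "cat_garbagedisposal": "Improved"},
    {"hh_id": 3, "gender": "Male", "age": 51, "total_expenditure": 200.0,
     "cat_watersource": "Improved", "cat_toilettype": "Improved",
     "cat_garbagedisposal": "Missing: Impute"},
]
clean = preprocess(rows)
for row in clean:
    print(row["hh_id"], row["expenditure_centered"], row["wash_total"], row["wash_prop"])
```

In this toy data the second household is dropped because its gender is coded NIU, and the third keeps a missing indicator count because one of its WASH variables was recoded to NA.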
The table below summarizes the proportion of missingness for all the variables.
We begin by showing the distribution of the individual WASH variables (indicators) over time and space (slum area). Thereafter, we show the distribution of the demographic, social and economic variables of interest by the composite WASH variable.
The specification for this model was as follows:
\[wash = demographs + social + economics + slum + year + \mathbf{(1 + year|hh\_id)}\]
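Written out more formally, the specification above might look like the following. This is a sketch that assumes a binary composite outcome with a logit link, which is not stated here; \(i\) indexes households, \(t\) interview years, \(\mathbf{x}_{it}\) collects the demographic, social, economic, slum and year covariates, and \(\boldsymbol{\Sigma}\) is the \(2 \times 2\) covariance matrix of the random intercept and random year slope.

\[\operatorname{logit}\Pr(wash_{it} = 1) = \mathbf{x}_{it}^{\top}\boldsymbol{\beta} + b_{0i} + b_{1i}\,year_{it}, \qquad (b_{0i}, b_{1i})^{\top} \sim N(\mathbf{0}, \boldsymbol{\Sigma})\]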
Random Effect:
hhid_anon:
| | (Intercept) | intvwyear |
|---|---|---|
| (Intercept) | 3.958 | -0.2849 |
| intvwyear | -0.2849 | 0.02327 |
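To make the random-effect table easier to read, the short sketch below converts it to standard deviations and a correlation. It assumes the table is the variance-covariance matrix of the household-level random intercept and random year slope (variances on the diagonal, covariance off the diagonal); the variable names are chosen only for this illustration.

```python
import math

# Values copied from the random-effect table for hhid_anon.
var_intercept = 3.958   # variance of the random intercept
var_slope = 0.02327     # variance of the random slope on intvwyear
cov = -0.2849           # covariance between intercept and slope

# Standard deviations are the square roots of the variances.
sd_intercept = math.sqrt(var_intercept)
sd_slope = math.sqrt(var_slope)

# Correlation = covariance / (product of the standard deviations).
corr = cov / (sd_intercept * sd_slope)

print(f"SD intercept: {sd_intercept:.3f}")
print(f"SD slope:     {sd_slope:.3f}")
print(f"Correlation:  {corr:.2f}")
```

Under that reading the intercept and slope are strongly negatively correlated (about -0.94), so households with a higher baseline tend to have a lower year slope.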
Fixed effects:
Mike and Morgan suggest another way to approximate a multivariate GLMM: reshape the dataset to long format (along the WASH variables) and then treat the new 'WASH indicator' as one of the fixed effects.
\[wash = (demographs + social + economics + slum + year)*wash\_indicator + \mathbf{(1 + year|hh\_id:wash\_indicator)}\]
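The reshaping step behind this specification can be sketched in plain Python as below. The column names `hh_id` and `year`, the `"Improved"` labels and the example rows are hypothetical; the `group` key simply mirrors the `hh_id:wash_indicator` grouping in the formula and is not a claim about how the fitted model stores it.

```python
WASH_VARS = ["cat_watersource", "cat_toilettype", "cat_garbagedisposal"]


def to_long(rows, id_vars):
    """Stack the WASH columns so each input row yields one row per indicator."""
    long_rows = []
    for row in rows:
        for var in WASH_VARS:
            record = {key: row[key] for key in id_vars}
            record["wash_indicator"] = var   # which WASH variable this row holds
            record["wash"] = row[var]        # the value of that variable
            # One grouping level per household-indicator pair.
            record["group"] = f"{row['hh_id']}:{var}"
            long_rows.append(record)
    return long_rows


# Hypothetical wide-format rows: one row per household-year.
wide_rows = [
    {"hh_id": 1, "year": 2003, "cat_watersource": "Improved",
     "cat_toilettype": "Unimproved", "cat_garbagedisposal": "Improved"},
    {"hh_id": 1, "year": 2004, "cat_watersource": "Improved",
     "cat_toilettype": "Improved", "cat_garbagedisposal": "Improved"},
]
long_rows = to_long(wide_rows, id_vars=["hh_id", "year"])
for record in long_rows:
    print(record)
```

Each household-year now contributes three rows, one per WASH variable, so the indicator can enter the model as a fixed effect and interact with the other covariates.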